Wide-Scale Data Stream Management

Authors

  • Dionysios Logothetis
  • Ken Yocum

Abstract

This paper describes Mortar, a distributed stream processing platform for building very large queries across federated systems (enterprises, grids, datacenters, testbeds). Nodes in such systems can be queried for distributed debugging, application control and provisioning, anomaly detection, and measurement. We address the primary challenges of managing continuous queries over thousands of wide-area sources that may periodically be down, disconnected, or overloaded. Examples include multiple data centers filled with cheap PCs, Internet testbeds such as PlanetLab, and country-wide sensor installations. Mortar presents a clean-slate design for best-effort in-network processing. For each query, it builds multiple static overlays and uses the union of overlay paths to provide resilient query installation and data routing. In addition, a unique data management scheme mitigates the impact of clock skew on distributed stream processing, which reduces result latency by a factor of 8. The same scheme lets users specify custom in-network operators that transparently benefit from multipath routing. Compared with a contemporary distributed snapshot querying substrate, Mortar uses a fifth of the bandwidth while providing better query resolution, responsiveness, and accuracy during failures.
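
The abstract does not say how Mortar builds its overlays or how data moves between them. The sketch below only illustrates why routing over the union of several static overlays is more resilient than routing over a single tree. It rests on assumptions that are not from the paper. Each overlay is a random tree with a hypothetical fan-out of 4. A live node may forward to its parent in any overlay. A source counts as connected if some chain of live nodes leads to the root. This is not Mortar's actual construction or routing algorithm.

```python
import random
from typing import Dict, List, Set

# Hypothetical representation: child -> parent; the root has no entry.
Tree = Dict[int, int]


def random_tree(nodes: List[int], root: int, rng: random.Random, fanout: int = 4) -> Tree:
    """Build a random tree rooted at `root` in which each node has at most `fanout` children."""
    others = [n for n in nodes if n != root]
    rng.shuffle(others)
    order = [root] + others
    parent: Tree = {}
    for i in range(1, len(order)):
        # Heap-style layout: positions 1..fanout attach to the root, and so on.
        parent[order[i]] = order[(i - 1) // fanout]
    return parent


def reachable(trees: List[Tree], root: int, alive: Set[int]) -> Set[int]:
    """Live nodes that can reach `root` when each hop may use a parent from ANY tree.

    Passing a single tree gives plain single-overlay routing; passing several
    trees models (under our assumption) routing over the union of overlay paths.
    """
    reached = {root} if root in alive else set()
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for tree in trees:
            for child, parent in tree.items():
                if child in alive and child not in reached and parent in reached:
                    reached.add(child)
                    changed = True
    return reached


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = list(range(200))
    root = 0
    trees = [random_tree(nodes, root, rng) for _ in range(4)]
    failed = set(rng.sample(nodes[1:], 20))  # 10% of non-root nodes are down
    alive = set(nodes) - failed
    single = len(reachable(trees[:1], root, alive)) - 1  # exclude the root itself
    union = len(reachable(trees, root, alive)) - 1
    print(f"sources reaching root: single tree={single}, union of 4 trees={union}")
```

Adding more trees can only grow the reachable set, because every path in one tree is still available in the union. The cost is the extra state and messages needed to maintain several overlays per query.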

Similar sources

Distributed Data Streams

Definition: A majority of today’s data is constantly evolving and fundamentally distributed in nature. Data for almost any large-scale data-management task is continuously collected over a wide area, and at a much greater rate than ever before. Compared to traditional, centralized stream processing, querying such large-scale, evolving data collections poses new challenges, due mainly to the phys...

Mortar: Towards Million-Node Data Stream Management

Compute federations represent a global pool of well-provisioned sensors, continuously emitting system and application-specific data streams, that can be queried for distributed debugging, application control, anomaly detection, measurement and data sampling. Such queries may involve thousands of stream sources distributed across the wide area, implying that node and network failures will be comm...

Erosion Hazard Index Methodology (EHIM) for Streams Erodibility Assessment (Ardabil-Province)

An erosion hazard index methodology (EHIM) was developed for assessing stream erosion. The index of stream erosion is designed as a management tool. Assessing stream erosion involves consideration of a range of aspects of streams and a value judgment about a desirable state. The assessment of the erosion indicators of streams was based on a state-wide assessment of physical stream condition. A ...

On the Complexity of Multi-Query Optimization in Stream Grids

Stream grids are wide-area grid computing environments that are fed by a set of stream data sources. Such grids are becoming more wide-spread due to the large scale deployment of sensor networks for a wide range of applications, from monitoring geophysical activities to supply chain management coupled with applications like network monitoring. Queries external to the system arrive on any node i...

An introduction to Stream Data Management on Large Information Networks

In recent times there has been a surge of large scale information networks arising in various application domains, ranging from communication networks, cellphone call networks, social networks, email networks, road traffic networks, financial transaction networks, to name a few. In such applications there is a need to manage and process large data streams in near-real time. Examples of such que...

Publication date: 2008